Applications and statistics for multiple high-scoring segments in molecular sequences.

نویسندگان

  • S Karlin
  • S F Altschul
چکیده

Score-based measures of molecular-sequence features provide versatile aids for the study of proteins and DNA. They are used by many sequence data base search programs, as well as for identifying distinctive properties of single sequences. For any such measure, it is important to know what can be expected to occur purely by chance. The statistical distribution of high-scoring segments has been described elsewhere. However, molecular sequences will frequently yield several high-scoring segments for which some combined assessment is in order. This paper describes the statistical distribution for the sum of the scores of multiple high-scoring segments and illustrates its application to the identification of possible transmembrane segments and the evaluation of sequence similarity.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes.

An unusual pattern in a nucleic acid or protein sequence or a region of strong similarity shared by two or more sequences may have biological significance. It is therefore desirable to know whether such a pattern can have arisen simply by chance. To identify interesting sequence patterns, appropriate scoring values can be assigned to the individual residues of a single sequence or to sets of re...

متن کامل

An Evolutionary and Phylogenetic Study of the BMP15 Gene

DNA sequence data contains a wealth of biologically useful information. Recent innovations in DNA sequencing technology have greatly increased our capacity to determine massive amounts of nucleotide sequences. These sequences can be used to specify the characteristics of different regions, interpret the evolutionary relationships between categorized groups, likelihood of performing multiple com...

متن کامل

A Distribution Function Arising in Computational Biology

Karlin and Altschul in their statistical analysis for multiple highscoring segments in molecular sequences introduced a distribution function which gives the probability there are at least r distinct and consistently ordered segment pairs all with score at least x. For long sequences this distribution can be expressed in terms of the distribution of the length of the longest increasing subseque...

متن کامل

Molecular analysis of AbOmpA type-1 as immunogenic target for therapeutic interventions against MDR Acinetobacter baumannii infection

Introduction: Acinetobacter baumannii is associated with hospital-acquired infections. Outer membrane protein A of A.baumannii (AbOmpA) is a well-characterized virulence factor which has important roles in pathogenesis of this bacterium. Methods: Based on our PCR-sequencing of ompA gene in the clinical isolates, AbOmpA protein can be categorized into two types, named here type-1 and type-2. We ...

متن کامل

A multiple sequence alignment program

A program is described for simultaneously aligning two or more molecular sequences which is based on first finding common segments above a specified length and then piecing these together to maximize an alignment scoring function. Optimal as well as near-optimal alignments are found, and there is also provided a means for randomizing the given sequences for testing the statistical significance ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Proceedings of the National Academy of Sciences of the United States of America

دوره 90 12  شماره 

صفحات  -

تاریخ انتشار 1993